BACKGROUND

A variety of systems are used for authoring multimedia presentations such as motion 
pictures, television shows, advertisements for television, presentations on digital versatile 
disks (DVDs), interactive hypermedia, and other presentations. Such authoring systems 
generally provide a user interface and a process through which multimedia data is captured 
and stored, and through which the multimedia presentation is created, reviewed and published 
for distribution. The user interface and process for authoring generally depend on the kind of 
presentation being created and what the system developer believes is intuitive and enables an 
author to work creatively, flexibly and quickly. 

Some multimedia presentations are primarily nontemporal presentations. That is, any 
change in the presentation typically depends on user activity or another event, instead of the
passage of time. Some nontemporal multimedia presentations may include temporal 
components. For example, a user may cause a video to be displayed that is related to a text 
document by selecting a hyperlink to the video in the document. 

Other multimedia presentations are primarily temporal presentations incorporating 
audio and/or video material, and optionally other media related to the temporal media. 
Primarily temporal media presentations that are well known today include streaming media 
formats such as QuickTime, Real Media, Windows Media Technology and SMIL, and 
formats that encode data in the vertical blanking interval of a television signal, such as used 
by WebTV, ATVEF, and other similar formats. 

A variety of authoring tools have been developed for different kinds of presentations. 
Tools for processing combined temporal and nontemporal media include those described in 
PCT Publication No. WO99/52045, corresponding to U.S. Patent Application Serial No. 
09/054,861, and PCT Publication No. WO96/31829, corresponding to U.S. Patent
Application Serial No. 08/417,974, and U.S. Patent 5,659,793 and U.S. Patent 5,428,731. 

SUMMARY 

An authoring tool has a graphical user interface enabling interactive authoring of a 
multimedia presentation including temporal and nontemporal media. The graphical user 
interface enables specification of the temporal and spatial relationships among the media and
playback of the presentation with the specified temporal and spatial relationships. The spatial
and temporal relationships among the media may be changed independently of each other. 
The presentation may be viewed interactively under the control of the author during the 
authoring process without encoding the audio and video data into a streaming media data file 
for combination with the other media, simulating behavior of a browser that would receive a 
streaming media data file. The multimedia presentation may include elements that initiate 
playback of the presentation from a specified point in time. After authoring of the 
presentation is completed, the authoring tool assists in encoding and transferring the 
presentation for distribution. Information about the distribution format and location may be 
stored as user-defined profiles. Communication with the distribution location may be tested 
and the presentation and the distribution information may be audited prior to encoding and
transfer to reduce errors. A presentation is encoded according to the defined temporal and
spatial relationships and the distribution format and location information to produce an
encoded presentation. The encoded presentation and any supporting media data are
transferred to the distribution location, such as a server. A streaming media server may be 
used for streaming media, whereas other data may be stored on a conventional data server. 
Accounts may be provided for a streaming media server for authors to publish their 
presentations. The authoring tool may be associated with a service that uses the streaming 
media server. Such streaming media servers also may be a source of stock footage for use by 
authors. These various functions, and combinations thereof, of the authoring tool are each 
aspects of the present invention that may be embodied as a computer system, a computer 
program product or a computer implemented process that provides these functions. 

In one embodiment, the spatial relationship may be defined by a layout specification 
that indicates an association of one or more tracks of temporal media and one or more tracks 
of nontemporal media with a corresponding display location. If the temporal media is not 
visible, such as audio, the spatial relationship may be defined among the nontemporal media. 

One kind of temporal relationship between nontemporal data and temporal media is 
provided by a table of contents track. The nontemporal media of elements associated with 
points in time in the table of contents track of a presentation is combined and displayed for 
the duration of the presentation. If a user selects one of the elements from the table of 
contents track, presentation of the temporal media data is initiated from the point in time 
associated with that element on the table of contents track. 

It is also possible to associate a streaming media presentation with another streaming
media presentation. For example, an event in one streaming media presentation may be used
to initiate playback of another subsequent streaming media presentation. The two
presentations may have different layout specifications. A document in a markup language
may be created to include a hyperlink to each of the plurality of streaming media
presentations.



BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is an illustration of an example multimedia presentation;

Fig. 2 is an illustration of a relationship among multiple presentations;

Fig. 3 is an illustration of a timeline for defining a multimedia presentation;

Fig. 4 illustrates example layouts for a multimedia presentation;

Fig. 5 is an illustration of an example graphical user interface for specifying a layout;

Fig. 6 is an illustration of an example graphical user interface for specifying a
mapping between frames in a layout and tracks in a timeline;

Fig. 7A is a data flow diagram illustrating a relationship of parts of a system for
authoring and publishing a multimedia presentation;

Fig. 7B is an illustration of an example graphical user interface for interactively
authoring and viewing a presentation;

Fig. 8A illustrates an architecture for implementing an editing viewer of Fig. 7A;

Fig. 8B illustrates an architecture for implementing a display manager of Fig. 8A;

Fig. 8C is a flowchart describing how a graphical user interface may be constructed;

Fig. 8D is a flowchart describing how a display manager may display contents and its
corresponding portion of the editing interface;

Fig. 8E is a flowchart describing how the table of contents display may be updated;

Fig. 8F is a flowchart describing how a new table of contents file may be generated;

Fig. 9 is a flowchart describing how a presentation may be published;

Fig. 10 illustrates a graphical user interface for managing a transfer process of a
multimedia presentation;

Fig. 11A is a flowchart describing how a presentation may be encoded;

Fig. 11B is a flowchart describing, in one implementation, how a program may be
encoded;

Fig. 11C is a flowchart describing how a presentation may be transferred;

Fig. 12 is a data flow diagram illustrating interaction of a transfer tool with a
streaming server and a data server; and

Fig. 13 is a data flow diagram illustrating a relationship of multiple editing and
transfer systems with a streaming server.

DETAILED DESCRIPTION 

In this description, all patent applications and published patent documents referred to 
herein are hereby incorporated by reference. 

Referring to Fig. 1, an example of a multimedia presentation, which may be created
using an authoring system to be described herein, will now be described. In general, a 
multimedia presentation is a combination of temporal media, such as video, audio and 
computer-generated animation, and nontemporal media, such as still images, text, hypertext 
documents, etc. Some temporal media, such as animations in the GIF format or the 
Macromedia Flash formats may be used as if they were nontemporal media. The temporal 
and nontemporal media may be combined in many different ways. For example, a 
multimedia presentation may include audio and/or video combined with multimedia slides 
that are time synchronized with the audio and/or video. The presentation also may include 
advertisements and/or an index of the temporal media. In general, there is a temporal 
relationship and a spatial relationship among the temporal and nontemporal media. In some 
presentations, only a temporal relationship exists between certain temporal media, such as 
audio, and the nontemporal media. An example presentation, shown in Fig. 1, includes video
100, HTML events 102, a table of contents 104, and an advertisement 106. 

Fig. 2 illustrates a more complex multimedia presentation format. This multimedia 
presentation includes a hypermedia document 200, for example in a markup language, 
including hyperlinks to one or more streaming media presentations, as indicated at 202, 204, 
and 206. Upon selection of a hyperlink, the corresponding streaming multimedia 
presentation 208, 210 or 212 may be played. An event at or near the end of a streaming 
multimedia presentation may be used to initiate playback of the subsequent multimedia 
presentation. The different presentations may have different specified spatial relationships. 

There are many ways in which such multimedia presentations may be stored. For 
example, various streaming media formats, such as Real Media, Microsoft Windows Media
Technology, QuickTime and SMIL, may be used. The temporal media also may be encoded
in a television signal, with nontemporal media encoded in a vertical blanking interval of the
television signal, such as used by WebTV, ATVEF and other formats.

Creating such a multimedia presentation involves creating a temporal relationship 
between each element of nontemporal media and the temporal media. Such a relationship 
may be visualized using a timeline, an example of which is shown in Fig. 3. In general, a 
timeline has one or more tracks of temporal media, and one or more tracks of nontemporal 
media. For example, there may be one video track, one audio track, and an event track. The 
presentation of the media on all the tracks is synchronized by the positions of the elements in 
the timeline. These positions may be specified graphically through a graphical user interface. 
Various data structures may be used to represent such a timeline, such as those described in 
U.S. Patent 5,584,006 (Reber), U.S. Patent 5,724,605 (Wissner) and PCT Publication No. 
WO98/05034. 

The timeline is a time based representation of a composition. The horizontal 
dimension represents time, and the vertical dimension represents the tracks of the 
composition. Each track has a row in the timeline which it occupies. The size of a displayed 
element in a graphical user interface is determined as a function of the duration of the 
segment it represents and a timeline scale. Each element in each track of the timeline has a 
position (determined by its start time within the presentation), a title and associated data and 
optionally a duration. 
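
The relationship between an element's timing and its displayed size can be sketched as follows. This is only an illustrative Python sketch: the class name TimelineElement, the function element_geometry, the use of seconds and pixels per second as units, and the minimum width given to elements without a duration are all assumptions made for the example and are not taken from the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimelineElement:
    """Hypothetical record for one element in one track of the timeline."""
    start: float                      # start time within the presentation, in seconds
    title: str
    data: str                         # reference to the associated media or data file
    duration: Optional[float] = None  # optional; some elements have no duration


def element_geometry(element, pixels_per_second, min_width=4):
    """Return (x, width) in pixels for drawing an element in its track row.

    The horizontal position follows from the start time and the width from
    the duration, both multiplied by the timeline scale. Elements without a
    duration are drawn with an assumed minimum width.
    """
    x = round(element.start * pixels_per_second)
    if element.duration is None:
        return x, min_width
    return x, max(min_width, round(element.duration * pixels_per_second))


clip = TimelineElement(start=10.0, title="Intro", data="intro.avi", duration=5.0)
marker = TimelineElement(start=2.0, title="Slide 1", data="slide1.htm")
print(element_geometry(clip, pixels_per_second=20))    # (200, 100)
print(element_geometry(marker, pixels_per_second=10))  # (20, 4)
```

Changing pixels_per_second in this sketch corresponds to changing the timeline scale: the same element is drawn wider or narrower without any change to its timing.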

Fig. 3 illustrates an example timeline which includes two audio tracks 300, two video 
tracks 302, two event tracks 304, a title track 306, and a table of contents track 308. Each of 
these tracks will now be described. 

An audio track 300 or a video track 302 is for placement of temporal media. Such 
tracks commonly are used in video editing applications, such as shown in PCT Publication 
No. WO98/05034, which corresponds to U.S. Patent Application Serial Nos. 08/687,926 and 
08/691,985. Similarly, a title track 306 commonly is used to create title effects for movies, 
such as scrolling credits. As such, titles commonly are considered temporal media because 
they have parameters that are animated over time and that are combined with video data. 
Each track supports defining a sequence of segments of media data. A segment references, 
either directly or indirectly, the media data for the segment. 



In the timeline shown herein, event tracks 304 associate nontemporal media with a 
particular point in time, thus creating a temporal relationship with the temporal media in 
tracks 300, 302, and 306. Each event track is a list of events. Each event includes a time and 
references a data file or a uniform resource locator, either directly or indirectly, from which 
media data for the event may be received. 

The table of contents track 308 associates a table of contents entry with a point in 
time. The table of contents may be used as an index to the temporal media. Each entry 
includes a time and associated content, typically text, entered by the author. As described in 
more detail below, the table of contents entries are combined into a single document for 
display. If a user selects an element in the table of contents as displayed, the presentation is 
displayed starting at the point in time corresponding to the selected element. 

The spatial relationship of the elements in the timeline as presented also may be 
specified by the author. In one simple example, a layout specification indicates a 
combination of frames of a display area, of which one or more frames is associated to one or 
more of the tracks in the timeline. Some tracks might not be associated with a display frame. 
Some frames might be associated directly with static media and not with a track. In general a 
frame is associated with only one track and a track is associated with only one frame. 

The possible combinations and arrangements of the various tracks in a timeline are 
unlimited, and are not limited to visual media. As shown in the examples in Fig. 4, the visual 
display may be merely a table of contents 400, or an event track 402, or both 404, for 
example, in combination with audio. These examples are merely illustrative. In some cases, 
the audio has a corresponding visual component that may be displayed, such as volume and 
position controls. Video may be displayed, for example, with an event track 406, or a table 
of contents track 408, or both 410, such as shown in Fig. 4. 

A graphical user interface, an example of which is described in connection with Fig.
5, enables a user to select from among several layout specifications that have been stored as 
templates. A graphical user interface, an example of which is described in connection with 
Fig. 6, enables an author to make assignments between tracks in the timeline and frames in 
the display. 

In Fig. 5, a graphical user interface 500 illustrates templates in a template window 
502. A template defines a mapping between frames and tracks and a display arrangement of 
the frames such as described in Fig. 4. A selected template such as 504 is viewed in a
preview pane 506. A user may browse the file system to identify other templates by selecting
a button 508 as in conventional user interfaces. A template may be defined using the 
hypertext markup language (HTML), for example by using frame set definitions. A template 
may be authored using any conventional HTML authoring tool, word processor or text editor. 
In the user interface, a template file may be accessed to determine its frame set definitions to 
generate an appropriate icon for display. Similarly, the preview pane 506 is generated by 
accessing the frame set definition within the selected template file. The mapping between 
frames and tracks also is stored in the template file. 
An example template file follows:

<HTML>
<AVIDPUB tagtype="framemap" framename="Frame_A" feature="MOVIE"
    originalurl="static.htm">
<AVIDPUB tagtype="framemap" framename="Frame_B" feature="EVENTTRACK"
    featurenum="1">
<AVIDPUB tagtype="framemap" framename="Frame_C" feature="EVENTTRACK"
    featurenum="2">
<AVIDPUB tagtype="framemap" framename="Frame_D" feature="TOC"
    originalurl="static.htm">
<AVIDPUB tagtype="framemap" framename="Frame_E" feature="EVENTTRACK"
    featurenum="3">
<AVIDPUB tagtype="framemap" framename="Frame_Top" feature="STATICHTML"
    featurenum="0" originalurl="header.htm">
<FRAMESET cols="40%,60%" bordercolor="blue" frameborder=yes framespacing=2>
    <FRAMESET rows="70,40%,*">
        <FRAME SRC="header.htm" name="Frame_Top">
        <FRAME SRC="AvidVid.htm" name="Frame_A">
        <FRAME SRC="AvidPubToc.html" name="Frame_D">
    </FRAMESET>
    <FRAMESET rows="33%,34%,*">
        <FRAME SRC="static.htm" name="Frame_B">
        <FRAME SRC="static.htm" name="Frame_C">
        <FRAME SRC="static.htm" name="Frame_E">
    </FRAMESET>
</FRAMESET>
</HTML>



The first few lines of this template include "<AVIDPUB>" HTML elements. These
elements keep track of the mappings between frames and tracks. Following these elements, a
frame set definition is provided using the "<FRAMESET>" element. Each frame has a
source file name (SRC = "filename") and a name (name = "name") associated with it. Each
<AVIDPUB> element maps a frame name to a "feature," which is the name of a type of
track, and a feature number, indicative of which of the tracks of that type is
mapped to the frame.
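
As a rough illustration of how such a mapping might be read back from a template, the following Python sketch uses the html.parser module of the Python standard library. The class name FrameMapParser and the use of None for a missing feature number are assumptions made for this example; the described system may read the template through a different HTML interface.

```python
from html.parser import HTMLParser

# A shortened template in the style of the example above.
TEMPLATE = """
<HTML>
<AVIDPUB tagtype="framemap" framename="Frame_A" feature="MOVIE"
    originalurl="static.htm">
<AVIDPUB tagtype="framemap" framename="Frame_B" feature="EVENTTRACK"
    featurenum="1">
<FRAMESET cols="40%,60%">
    <FRAME SRC="AvidVid.htm" name="Frame_A">
    <FRAME SRC="static.htm" name="Frame_B">
</FRAMESET>
</HTML>
"""


class FrameMapParser(HTMLParser):
    """Collects frame name -> (feature, feature number) from AVIDPUB elements."""

    def __init__(self):
        super().__init__()
        self.frame_map = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "avidpub":
            return
        attributes = dict(attrs)
        if attributes.get("tagtype") != "framemap":
            return
        number = attributes.get("featurenum")
        self.frame_map[attributes["framename"]] = (
            attributes.get("feature"),
            int(number) if number is not None else None,
        )


parser = FrameMapParser()
parser.feed(TEMPLATE)
parser.close()
print(parser.frame_map)
# {'Frame_A': ('MOVIE', None), 'Frame_B': ('EVENTTRACK', 1)}
```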

A template may include other content and structure beyond that shown in the 
example. For example, a company may want all of its presentations to use the same logo in 
the same position. This consistency may be provided by adding a reference to the logo to the 
template. 

By selecting the next button 510 in Fig. 5, the mapping between frames and tracks 
may be defined. A user interface such as shown in Fig. 6 is then displayed. The system uses 
the template HTML file to generate a view 600. Also, the frame names are extracted from 
the selected template and are listed in a region 602. The available tracks for a presentation 
are accessed, possibly using the timeline, to generate menus such as indicated at 604. The 
name of each track is put into a menu associated with each frame name to enable a user to 
select that track and associate it with the corresponding frame. If a track is associated with a 
frame, the <AVIDPUB> element for that frame has its feature attribute modified to indicate
the track is associated with that frame. A check may be performed to ensure that a track is 
not associated with more than one frame. 
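
The check mentioned above could take a form such as the following Python sketch. The function name find_duplicate_tracks, the representation of a track as a (feature, feature number) pair, and the use of None for a frame with no track are assumptions made for illustration only.

```python
from collections import defaultdict


def find_duplicate_tracks(frame_to_track):
    """Return {track: [frame names]} for every track mapped to more than one frame.

    frame_to_track maps a frame name to a track identifier, or to None when
    the frame is not associated with any track (for example, static content).
    """
    frames_by_track = defaultdict(list)
    for frame_name, track in frame_to_track.items():
        if track is not None:
            frames_by_track[track].append(frame_name)
    return {
        track: frames
        for track, frames in frames_by_track.items()
        if len(frames) > 1
    }


assignments = {
    "Frame_A": ("MOVIE", None),
    "Frame_B": ("EVENTTRACK", 1),
    "Frame_C": ("EVENTTRACK", 1),   # conflicts with Frame_B
    "Frame_Top": None,              # static frame, no track
}
print(find_duplicate_tracks(assignments))
# {('EVENTTRACK', 1): ['Frame_B', 'Frame_C']}
```

An empty result indicates that every track is associated with at most one frame, so the mapping may be accepted.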

In this and other processes described below in which an HTML file is read and
accessed, an application programming interface provided by the Microsoft Corporation
may be used to read and write data in HTML files.

Having now described examples of data structures for timelines and layout
specifications, how they may be defined, and how they may be associated with each other,
authoring and publishing of such presentations will now be described.

Fig. 7A is a data flow diagram illustrating a relationship of parts of a system for 
authoring and publishing a multimedia presentation. Using an editing graphical user 
interface (GUI) 700, described below with Fig. 7B, and a layout GUI 702, described above
with Figs. 5 and 6, timeline activity 704 and a layout specification 706 are defined. This
data is provided to an editing manager 708 to enable viewing of the presentation during
editing. The editing manager, given a point in time 722 on the timeline and optionally a
playback rate 724 from the editing GUI 700, generates video data 714 and other visible data
710 for display in the editing GUI 700, in an arrangement defined by the layout specification
706, using media files 712. An example implementation of the editing manager is described
below in connection with Figs. 8A-F. After the author has completed creating the presentation,
the publisher 718 is invoked to process the timeline 716, layout specification 706, and media
files 712 to generate the published presentation 720.

An example GUI for the editing GUI of Fig. 7A will now be described in connection 
with Fig. 7B. In Fig. 7B, the timeline region 700 includes an index track 702, a video track 
704, a titles track 706, two audio tracks 708 and 710, three event tracks 712, 714 and 716 and 
the timeline scale 718. The timeline scale determines the number of pixels that represents a 
time unit. Increasing or decreasing this time scale allows the user to focus on a particular 
location in the composition, or to have more of an overview of the composition. A viewer
window 720 displays the video data and other visual information. A display controller 722 
includes a position indicator 724 which points to the present position within the multimedia 
presentation which is being viewed. Forward and backward skip buttons 726 and 728 and 
play buttons 730 also may be provided. The position indicator 724 is associated with a 
position indicator 736 in the timeline 700. The buttons 726, 728 and 730, and position 
indicator 724 may be used to control the viewing of the multimedia presentation during 
authoring. Frame boundaries, as indicated at 732 and 734, correspond to the frame set
definitions in the layout specification. The frame boundaries 732 and 734 may be made adjustable using a
cursor positioning device, such as a mouse or touchpad. Such adjustments may be 
transformed into edits of the layout specification. The various kinds of operations that may 
be performed to edit the audio and video and to add titles are described in more detail in PCT 
Publication No. WO98/05034. 

How entries in the index or table of contents track 702 and event tracks 712 through 
716 are added or modified will now be described. A region 740 illustrates available 
multimedia data for insertion into events. Buttons 742, 744 and 746 enable different views of 
the information presented in region 740. Button 742 selects a mode in which the system 
displays a picture of the data. Button 744 selects a mode in which the system displays a 
detailed list including a small picture, filename, and timestamp of the data file or resource. 
Button 746 selects a mode in which the system displays only titles. Other modes are possible 
and the invention is not limited to these. The names displayed are for those files found in the
currently active path in the file system used by the authoring tool or other resources available 
to the system. The list operation, for example, may involve a directory lookup performed by 
the computer on its file system. A user may select an indicated data file or resource and drag 
its icon to an event timeline either to create a new event, or to replace media in an existing 
event, or to add media to an existing event. 

On the event timeline, an event 750 indicates a data file or other resource associated 
with a particular point in time. Event 752 indicates that no file or resource is associated with 
the event at this time. In response to a user selection of a point on an event track, a new event 
may be created, if one is not already there, or the selected event may be opened. Whether a 
new event is created, or an existing event is opened, the user may be presented with a 
properties dialog box to enable entry of information, such as a name for the event, or a file 
name or resource locator for the associated media, for storage into the event data structure. 
An event that is created may be empty, i.e., might not refer to any data file or resource. 

The elements on the event track may be illustrated as having a width corresponding to 
the amount of time it would take to download the data file over a specified network 
connection. To achieve this kind of display, the number of bytes of a data file is divided by 
the byte-per-second rate of the network connection to determine a time value, in seconds, 
which is used to determine the width of the icon for the event to be displayed on the event 
track. Displaying the temporal width of an object provides information to the author about 
whether enough time is available at the location of distribution to download the data and to 
display the data at the desired time. 
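
The width calculation described above amounts to a division followed by a scaling, as in the following Python sketch. The function names and the pixels-per-second representation of the timeline scale are assumptions made for illustration.

```python
def download_seconds(file_size_bytes, bytes_per_second):
    """Time needed to download a data file over the specified connection."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return file_size_bytes / bytes_per_second


def event_width_pixels(file_size_bytes, bytes_per_second, pixels_per_second):
    """Width of the event icon on the event track, in pixels."""
    seconds = download_seconds(file_size_bytes, bytes_per_second)
    return round(seconds * pixels_per_second)


# A 56,000-byte file over a 7,000 byte-per-second connection takes 8 seconds,
# which is drawn 80 pixels wide at an assumed scale of 10 pixels per second.
print(download_seconds(56_000, 7_000))        # 8.0
print(event_width_pixels(56_000, 7_000, 10))  # 80
```

If the icons of two adjacent events overlap at the chosen connection rate, the author can see that the second data file might not finish downloading before its display time.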

Similar to the events, a user may select an element on the table of contents track as 
indicated at 754. An item may be added by selecting a point on the table of contents track 
with a cursor control device. Upon selection, a dialog window is displayed through which 
the user may enter text for the selected element. Each of the elements in the table of contents 
track 702 is displayed in the frame 756 in the viewer 720. 

To display the presentation to the author, for a given point in time of the presentation, 
the system determines which contents should be displayed. In the example shown in Fig. 7B, 
event 758 is currently being displayed from the event track in viewer frame 760. The video is 
being shown in frame 762. The table of contents elements are shown in frame 756. A viewer 
such as shown in Figs. 7A and 7B may be implemented in many ways, depending on the
availability of preexisting program components to be used, and the platform on which the
viewer is implemented. An example implementation will now be described in connection
with Figs. 8A through 8E for use with a platform as specified below. In this implementation, the
viewer uses an Internet Explorer browser component, available from Microsoft Corporation, 
to render the nontemporal media. Currently available browser components are capable of 
processing encoded streaming media files but not video and audio data defined using a 
timeline. Thus, the temporal media, in particular the audio and video, is rendered in a manner 
typical in video editing systems, such as described in PCT Publication No. WO98/05034. 
The viewer described herein reads a presentation and accesses data, audio and video files to 
produce the presentation without an encoded streaming media file, thus simulating the
operation of a browser that uses streaming media files. 

Referring now to Fig. 8A, an architecture for this implementation is illustrated. This 
architecture includes an asset manager 8100 which manages access to data files 8102 used in 
the presentation. A clip manager 8104 maintains the timeline data structure 8106 in response 
to instructions from the user via the graphical user interface. Requests for access to 
information from the timeline 8106 by the presentation manager 8108 and display manager 
8110 also are managed by the clip manager 8104. The presentation manager 8108 maintains
the layout specification 8112 and other display files 8114. The other display files include
files in a markup language that define the table of contents frame and the video frames. An 
example layout was described above in connection with Fig. 6. An example table of contents 
file and example video frame files, for the Real Media and Windows Media technology 
formats, are provided in Appendices I-III, the interrelationship of which will now be 
described. 

There are several ways in which the table of contents may be constructed to allow 
actions on a table of contents entry to cause a change in the playback position in the video 
frame. One example is provided by the source code in Appendices I-III. In the table of
contents page, a JavaScript function called "seekToEPMarker" takes either a marker
number (for Windows Media technology) or a time in milliseconds (for Real Media) and calls
a function "seekToVideoMarker" of its parent frame in its frame set. This function call
in turn calls the JavaScript function of the child frame of the table of contents' parent
frame that includes the video player. That function receives the marker and the time in
milliseconds and generates the appropriate commands to the media player to initiate playback
of the streaming media from the designated position.
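
The chain of calls from the table of contents frame, through its parent frame set, to the video frame can be modeled as in the following Python sketch. This is not the script of Appendices I-III: the class names, the method names, the format strings and the command tuples recorded by the video frame are hypothetical stand-ins chosen only to show the delegation.

```python
class VideoFrame:
    """Stand-in for the frame that hosts the media player."""

    def __init__(self, player_format):
        self.player_format = player_format
        self.commands = []  # commands that would be issued to the media player

    def seek(self, marker, time_ms):
        # Windows Media seeks by marker number; Real Media seeks by time.
        if self.player_format == "windows_media":
            self.commands.append(("seek_marker", marker))
        elif self.player_format == "real_media":
            self.commands.append(("seek_ms", time_ms))
        else:
            raise ValueError(f"unsupported format: {self.player_format}")


class ParentFrameSet:
    """Stand-in for the frame set containing both child frames."""

    def __init__(self, video_frame):
        self.video_frame = video_frame

    def seek_to_video_marker(self, marker, time_ms):
        # Forward the request to the child frame that holds the player.
        self.video_frame.seek(marker, time_ms)


class TocFrame:
    """Stand-in for the table of contents frame."""

    def __init__(self, parent):
        self.parent = parent

    def seek_to_ep_marker(self, marker, time_ms):
        # Called when the user selects a table of contents entry.
        self.parent.seek_to_video_marker(marker, time_ms)


video = VideoFrame("real_media")
toc = TocFrame(ParentFrameSet(video))
toc.seek_to_ep_marker(marker=3, time_ms=42_000)
print(video.commands)  # [('seek_ms', 42000)]
```

Routing the call through the parent keeps the table of contents page independent of the player format; only the video frame needs to know which kind of seek command to issue.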

Turning again to Fig. 8A, the display managers 8110 each are associated with a
display window in the viewer and control displaying content in their respective windows. In
general, the display managers access data from the presentation manager 8108 and clip
manager 8104 to provide data to the graphical user interface 8116, in response to events that
modify the timeline or the presentation of data in the timeline as received from the graphical
user interface or the clip manager. The graphical user interface 8116 communicates with the
clip manager, presentation manager and display manager to create and maintain the view of
the timeline and the presentation in response to user inputs. 

A display manager, in one implementation, is described in more detail in connection 
with Fig. 8B. The display manager includes a controller module 8200 which communicates 
with the graphical user interface, presentation manager and clip manager. To display a data 
file, the controller instructs a browser component 8202 to render data for display. The output 
of the browser component is processed by an image scaling module 8204 that scales the 
result to fit within the appropriate display region in the viewer. 

Referring now to Fig. 8C, how the display of the presentation in the viewer may be 
created will now be described. In particular, the layout of the presentation is defined by the 
layout specification 8112. This layout specification is parsed 8300 to generate a tree-like 
representation of the layout. In particular, as shown in the example layout specification 
provided above, some frames are defined as subframes of other frame sets. This hierarchical 
definition of frames translates into a tree-like representation. For each nonleaf node in the 
tree, a splitter window is created 8302 in the presentation display region on the user interface. 
For each leaf node in the tree, a display window is created 8304 within its associated splitter 
window. This display window is instructed 8306 to display its content at time zero, i.e., the 
beginning, in the presentation to initialize the display. The display window has an associated 
display manager 8110. 
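
The translation of nested frame set definitions into a tree can be sketched in Python as follows, again using the standard html.parser module. The dictionary-based node representation, the "splitter" and "display" labels, and the helper leaf_names are assumptions made for this example; window creation itself is omitted.

```python
from html.parser import HTMLParser

LAYOUT = """
<FRAMESET cols="40%,60%">
    <FRAMESET rows="70,40%,*">
        <FRAME SRC="header.htm" name="Frame_Top">
        <FRAME SRC="AvidVid.htm" name="Frame_A">
        <FRAME SRC="AvidPubToc.html" name="Frame_D">
    </FRAMESET>
    <FRAMESET rows="33%,34%,*">
        <FRAME SRC="static.htm" name="Frame_B">
        <FRAME SRC="static.htm" name="Frame_C">
        <FRAME SRC="static.htm" name="Frame_E">
    </FRAMESET>
</FRAMESET>
"""


class FrameTreeParser(HTMLParser):
    """Builds a tree: FRAMESET -> nonleaf node, FRAME -> leaf node."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open FRAMESET nodes

    def _attach(self, node):
        if self.stack:
            self.stack[-1]["children"].append(node)
        elif self.root is None:
            self.root = node

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "frameset":
            node = {"kind": "splitter", "rows": attributes.get("rows"),
                    "cols": attributes.get("cols"), "children": []}
            self._attach(node)
            self.stack.append(node)
        elif tag == "frame":
            self._attach({"kind": "display", "name": attributes.get("name"),
                          "src": attributes.get("src")})

    def handle_endtag(self, tag):
        if tag == "frameset" and self.stack:
            self.stack.pop()


def leaf_names(node):
    """Names of the display windows under a node, in document order."""
    if node["kind"] == "display":
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


parser = FrameTreeParser()
parser.feed(LAYOUT)
parser.close()
print(parser.root["kind"], len(parser.root["children"]))  # splitter 2
print(leaf_names(parser.root))
```

In this sketch each "splitter" node would correspond to a splitter window and each "display" node to a display window with its own display manager.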

How the display manager displays data given a specified time in the presentation will 
now be described in connection with Fig. 8D. In particular, the display manager receives 
8400 a time T. For event tracks, the event that has most recently occurred in the presentation 
prior to time T is identified 8402. The data file for that event is then obtained 8404. The 
browser component is then instructed 8406 to render the received data file. The image 
scaling module scales the image produced by the browser component, in 8408, which is then
displayed 8410 in the associated window. For video information, this process involves 
identifying the sample from the data file for the segment that is in the presentation at the 
specified time. This sample is scaled and displayed. Because the table of contents file is not 
time dependent, it is simply rendered, scaled and displayed and step 8402 may be omitted. 
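
Identifying the event to display for a time T can be done with a binary search over the event times, as in the following Python sketch. The function name current_event and the (time, data file) tuple representation are assumptions for illustration, as is the choice to treat an event occurring exactly at T as already having occurred.

```python
from bisect import bisect_right


def current_event(events, t):
    """Return the most recent event at or before time t, or None.

    events is a list of (time_in_seconds, data_file) tuples sorted by time.
    """
    times = [time for time, _ in events]
    index = bisect_right(times, t) - 1
    return events[index] if index >= 0 else None


event_track = [(5.0, "slide1.htm"), (20.0, "slide2.htm"), (45.0, "slide3.htm")]
print(current_event(event_track, 2.0))   # None: no event has occurred yet
print(current_event(event_track, 20.0))  # (20.0, 'slide2.htm')
print(current_event(event_track, 30.0))  # (20.0, 'slide2.htm')
```

The data file of the returned event is what the browser component would then be asked to render; when the result is None, the frame has nothing from the event track to show at that time.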

After initialization, each display manager acts as a "listener" process that responds to 
messages from other components, such as the clip manager and graphical user interface, to 
update the display. One kind of update is generated if display controls in the graphical user 
interface are manipulated. For example, a user may modify the position bar on either the 
timeline or the viewer to initiate display from a different point in time T. In response to such 
a change, the graphical user interface or the clip manager may issue a message requesting the 
display managers to update the display given a different time T. Similarly, during editing, 
changes to the timeline data structure at a given point in time T cause the clip manager to 
instruct the display managers to update the display with the new presentation information at 
that point in time T. 

Playback may be implemented using the same display mechanism. During either 
forward or reverse playback at a continuous or user-controlled rate, a stream of instructions to 
update the display at different points in time T may be sent to the display managers. Each 
display manager updates its region of the display at each of the specified times T which it 
receives from the clip manager or graphical user interface. 
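
As a minimal sketch of the listener arrangement, and not a description of the actual implementation, the display managers may be modeled as objects registered with a sender that broadcasts each time T. The class and method names (`DisplayManager`, `ClipManager`, `seek`, `play`) are assumptions made for this example, and rendering is replaced by recording the times received.

```python
class DisplayManager:
    """Hypothetical listener that redraws one display region for a time T."""

    def __init__(self, name):
        self.name = name
        self.shown = []   # times displayed so far, standing in for real rendering

    def update(self, t):
        self.shown.append(t)


class ClipManager:
    """Hypothetical sender of update messages to registered display managers."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def seek(self, t):
        # A single update, e.g. after the position bar is moved or an edit is made.
        for listener in self._listeners:
            listener.update(t)

    def play(self, start, end, step):
        """Send a stream of updates; a negative step gives reverse playback."""
        if step == 0:
            raise ValueError("step must be nonzero")
        t = start
        while (step > 0 and t <= end) or (step < 0 and t >= end):
            self.seek(t)
            t += step


clips = ClipManager()
video = DisplayManager("video")
clips.register(video)
clips.play(0, 2, 1)    # forward playback
clips.play(2, 0, -1)   # reverse playback
print(video.shown)
```

In this sketch playback and seeking share one code path, which mirrors the statement that playback may use the same display mechanism as editing.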

Although the table of contents generally is a single file without time dependency, 
during editing it may be modified, after which the display is updated. One implementation 
for modifying the table of contents display will now be described in connection with Figs. 8E 
and 8F. In Fig. 8E, a display manager for the table of contents receives 8500 a message from 
the clip manager that a table of contents entry has been added to the table of contents track. 
The display manager requests 8502 a new table of contents file from the presentation manager. 
After receiving 8504 the indication of the new table of contents file, the browser component 
is instructed 8506 to render the data file. The rendered data file is then scaled 8508 for 
display. 

How the presentation manager generates a new table of contents file is described in 
Fig. 8F. The presentation manager receives 8600 a message requesting a new table of 
contents file. The presentation manager requests 8602 the table of contents track information 
from the clip manager. HTML data is generated 8604 for each table of contents entry. 
Referring to the sample table of contents file in Appendix I, a list of items is created for each 
entry in the table of contents track. The table of contents file is then modified with the newly 
generated HTML, for example, by overwriting the table of contents information in the 
existing table of contents file. Although the identity of the table of contents file is known by 
the display manager, the presentation manager may return the name of the data file to confirm 
completion of the generation of the table of contents. 
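
For illustration, the generation of HTML for the table of contents entries (step 8604) may be sketched as follows. The markup shown is an assumption: the `#t=` link format and the use of an unordered list are hypothetical and are not taken from the sample file in Appendix I.

```python
from html import escape


def toc_html(entries):
    """Build an HTML list from (time_in_seconds, title) table of contents entries."""
    items = [
        # The href format is a placeholder; titles are escaped for safe display.
        f'  <li><a href="#t={time:g}">{escape(title)}</a></li>'
        for time, title in sorted(entries)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(toc_html([(12.5, "Q&A"), (0, "Introduction")]))
```

The returned string could then overwrite the table of contents portion of the existing file, as described above.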

In one implementation, the display manager for each frame also may permit display of 
a zoomed version of the frame. In this implementation, selection of a frame for zooming 
causes the graphical user interface to display the data for this frame in the full display region. 
For video and event tracks, the zoom instruction merely changes the image scaling 
performed on the image to be displayed. For the table of contents track, the zoomed version 
may be provided by a display that enables editing of the table of contents. Modifications to 
the entries in the table of contents in the zoomed interface are passed back to the clip 
manager to update the timeline data structures. 

After editing of a presentation is complete, the presentation may be published in its desired 
distribution format. A variety of operations may be performed and assisted by the publishing 
component of this system to prepare a presentation for distribution. Operations that may be 
performed to publish a multimedia presentation will now be described in more detail in 
connection with Fig. 9. 

First, the author provides setup data, which is accepted 900 through a GUI, to define 
the distribution format and other information used to encode and transfer the presentation. 

For example, the selected output format may be a streaming media format, such as 
RealG2, Windows Media Technology, QuickTime or SMIL. Other settings for the encoder 
may include the streaming data file type, the video width, the video height, a title, author, 
copyright and keyword data. 

For transferring the presentation, various information may be used to specify 
characteristics of one or more servers to which the presentation will be sent and any account 
information for those servers. Transfer settings may include a transfer protocol, such as file 
transfer protocol (FTP) or a local or LAN connection, for sending the presentation data files 
to the server. The server name, a directory at the server in which the media files will be 
copied, and optionally a user name and password also may be provided. A default file name 
for the server, and the HTTP address or URL of the server from which a user will access the 
published presentation, also may be provided. The server information may be separate for 
both data files and streaming media files. 

This encoding and transfer information may be stored by the transfer tool as a named 
profile for later retrieval for transferring other presentations. Such profile data may include, 
for example, the data defining settings for encoding, and the data defining settings for transfer 
of encoded data files. 

When setting up each of the connections for transfer, the connection also may be 
tested to confirm its operation. This test process involves transferring a small file to the 
destination and confirming the ability of the system to read the file from the destination. 
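
A minimal sketch of such a test, assuming a local or LAN destination that is reachable as a directory, is given below. The function name `check_destination` and the probe file naming are hypothetical. An FTP destination would need the same write, read back and delete sequence carried out over that protocol, which is not shown.

```python
import tempfile
import uuid
from pathlib import Path


def check_destination(directory):
    """Write a small probe file to `directory`, read it back, then remove it.

    Returns True only if the round trip succeeds.
    """
    probe = Path(directory) / f"probe-{uuid.uuid4().hex}.tmp"
    payload = b"connection test"
    try:
        probe.write_bytes(payload)
        return probe.read_bytes() == payload
    except OSError:
        return False
    finally:
        # Best-effort cleanup so the test leaves nothing at the destination.
        try:
            probe.unlink()
        except OSError:
            pass


with tempfile.TemporaryDirectory() as staging:
    print(check_destination(staging))
```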

After setup, the presentation may be audited 901 to reduce the number of errors that 
may otherwise result during the encoding and/or transfer processes. Profile information, 
described below, the presentation, and other information may be reviewed for likely sources 
of errors. For example, titles and/or other effects may be checked to determine whether the 
title and/or effect has been rendered. The timeline data structure may be searched to identify 
the data files related to each event, segment, table of contents entry, etc., to determine if any 
file is missing. The events in the timeline may be compared to the video or audio or other 
temporal data track to determine if any events occur after the end of the video or audio or 
other temporal data track. The layout specification also may be compared to the timeline data 
structure to ensure that no events or other data have been defined on tracks that are not 
referred to in the layout specification. Results of these various tests on the layout and 
timeline data structures may be provided to the user. Information about the profile used for 
the transfer process also may be audited. For example, whether passwords might be used on 
the target server, and the other information about the accessibility of the target server may be 
checked. The target directory also may be checked to ensure that no files in the native file 
format of the authoring tool are present in the target directory. Various other tests may be 
performed in an audit process and the invention is not limited thereto. 
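
Three of the audit checks described above (missing files, events after the end of the temporal media, and tracks absent from the layout) may be sketched as follows. The dictionary keys, the function name `audit` and the warning text are assumptions made for this example only.

```python
import os


def audit(events, media_duration, layout_tracks, file_exists=os.path.exists):
    """Return a list of human-readable warnings for a presentation.

    `events` is a list of dicts with hypothetical keys "time", "track", "file".
    `file_exists` is injectable so the check can be exercised without real files.
    """
    warnings = []
    for e in events:
        if not file_exists(e["file"]):
            warnings.append(f"missing file: {e['file']}")
        if e["time"] > media_duration:
            warnings.append(
                f"event at {e['time']}s occurs after the media ends at {media_duration}s"
            )
        if e["track"] not in layout_tracks:
            warnings.append(f"track '{e['track']}' is not referred to in the layout")
    return warnings


sample = [
    {"time": 5, "track": "slides", "file": "a.html"},
    {"time": 90, "track": "notes", "file": "b.html"},
]
for line in audit(sample, 60, {"slides"}, file_exists=lambda p: p == "a.html"):
    print(line)
```

Collecting warnings rather than stopping at the first problem lets all results be presented to the user together.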

After optional auditing, the presentation is encoded 902 by transforming the timeline 
data structures into a format used by a standard encoder, such as provided for the Real Media 
Player or Windows Media Technology. Such encoding is described in more detail below in 
connection with Figs. 11A and 11B. The encoded presentation optionally may be previewed 
904. To support preview, during encoding the files used to encode, and that will ultimately 
be transferred to each server, are collected locally. The presentation may be encoded first to 
support preview by referring to the local files. The files for the presentation then are 
transferred 906 to each server. Before transfer, if the presentation was encoded for local 
preview, the references to local files are translated into references to files on the destination 
servers. For example, the encoded streaming media file generally is provided to a streaming 
media server, whereas other data files referred to by the streaming media file are provided to 
a standard hypertext transfer protocol daemon (HTTPD) or web server. The transfer process 
is described in more detail below in connection with Fig. 11C. Finally, the transferred 
presentation may be previewed 908 from the remote site. 
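
The translation of local file references into references on a destination server may be sketched as below. This is an illustration under stated assumptions: the local staging directory uses POSIX-style paths, the server address is a placeholder, and the function name `to_remote_reference` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def to_remote_reference(local_path, local_root, server_url):
    """Map a file under `local_root` to a URL under `server_url`.

    Raises ValueError if `local_path` is not inside `local_root`.
    """
    relative = PurePosixPath(local_path).relative_to(local_root)
    # Percent-encode characters such as spaces; "/" separators are preserved.
    return server_url.rstrip("/") + "/" + quote(str(relative))


print(to_remote_reference(
    "/stage/show/slides/a b.html", "/stage/show", "http://www.example.com/show/"
))
```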

A graphical user interface for facilitating the publishing process described in Fig. 9 
will now be described in connection with Fig. 10. A user may set profile data by selecting 
setup or options 1000. During set up, a profile may be recalled, created or edited, and the 
user may specify the file folder and server on which the presentation will be stored. In 
response to selection of the "do it" menu item, the screen shown in Fig. 10 is displayed. First 
the presentation and profile data are audited. After the auditing step is complete, a 
checkmark appears in an icon 1006. Next, encoding of the presentation may be started at 
1008. A user may optionally select to preview the encoded presentation locally prior to 
transfer. By selecting button 1010, a preview of the presentation may be initiated. After 
preview, the icon 1012 includes a checkmark. During transfer, a user may select to overwrite 
files that have the same name on the destination server, as indicated at 1014. The user may 
initiate the transfer by selecting the button indicated at 1016. After completion, the icon 1018 
includes a checkmark. Finally, after transfer, the user may view the presentation as 
transferred from the destination server by selecting button 1020. 

Referring to Fig. 11A, encoding of a presentation will now be described. In general, 
most encoders have an application programming interface that generates an encoded file in 
response to commands to add samples of media to the presentation. The commands for 
adding samples generally include the type of media, the time in the presentation in which the 
media is to be added and the media data itself as inputs to the command. The sample for 
video data is usually a frame. The sample of audio data is usually several samples defining a 
fraction of a second. The data also may be, for example, a uniform resource locator (URL) or 
other data. 

More particularly, an API has functions that: 1) enable opening the component, 2) 
optionally present the user with a dialog box interface to configure the component, 3) set 
settings of the component that control its behavior, 4) connect the component to a user visible 
progress bar and to the source of the data, 5) initiate the component to start translating the 
data into the desired format, 6) write the desired format to a file, and 7) close the component 
if the process is complete. On the receiving side of the API, the system has code to respond 
to requests for data from the export or encode component. The export component generally 
accesses the time, track number, and file or URL specified by the user, which are obtained 
from the timeline data structure. To the extent that data interpretation or project-specific 
settings are used by the encoder, this information also may be made available through an 
API. 

The video and audio may be encoded 1100 separately using standard techniques. The 
table of contents and event tracks are then processed. In particular, a list of event assets is 
generated 1102. An event asset is defined by its filename, track, and time in the presentation. 
The frame set is then accessed 1104 to obtain a list of tracks and frame names. The items in 
the event tracks are then added to the streaming media file using the filename for the event 
and the frame name for the event, at the indicated time for the event, in 1106. The filename 
for the event is its full path including either a full URL for remote files or an indicator of the 
disk volume for files that are accessed locally or over a local area network (LAN). In step 
1106, the filenames and frame names inserted into the streaming media file are those in the 
destination to which the media file is being transferred. Therefore, the encoding is dependent 
in part on the transfer parameters. The list created in step 1102 may be sorted or unsorted. 
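
By way of example only, the pairing of each event asset with the frame name for its track may be sketched as follows. The `EventAsset` type, the `frame_names` mapping and the tuple layout are hypothetical; a real encoder would receive these values through its own interface, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAsset:
    filename: str   # full path or URL of the file at the destination
    track: str      # event track on which the asset appears
    time: float     # time of the event in the presentation, in seconds


def build_event_commands(assets, frame_names):
    """Pair each asset with the frame for its track, ordered by time.

    `frame_names` maps a track name to a frame name, as obtained from the
    frame set. Returns (time, filename, frame_name) tuples.
    """
    commands = []
    for asset in sorted(assets, key=lambda a: a.time):
        if asset.track not in frame_names:
            raise KeyError(f"no frame defined for track {asset.track!r}")
        commands.append((asset.time, asset.filename, frame_names[asset.track]))
    return commands


assets = [EventAsset("b.html", "slides", 30.0), EventAsset("a.html", "notes", 10.0)]
frames = {"slides": "main", "notes": "side"}
print(build_event_commands(assets, frames))
```

Sorting here is a choice of this sketch; as noted above, the list of event assets may equally be left unsorted at this stage.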

Using Real Media, the table of contents track does not affect the streaming media file. 
Using Windows Media Technology, however, marker codes are inserted for each table of 
contents entry, although no marker codes are inserted for events. 

Referring to Fig. 11B, an implementation using the Real Media encoder will now be 
described. A Real Media encoder 1120 issues requests 1122 for samples at a specified time. 
In response to these requests, a presentation processor 1124 implements the process described 
in Fig. 11A, and returns a sample 1126 from an event that occurs in the presentation at a time 
closest to and after the requested time. The response 1126 also indicates a time at which the 
encoder 1120 should request the next sample. This time is the time corresponding to the 
sample which was returned by the presentation processor 1124. The list of event assets 
created in 1102 in Fig. 11A may be sorted prior to initiating encoding with the Real Media 
encoder 1120, or may be sorted on the fly in response to requests 1122 from the Real Media 
encoder 1120. After the end of the presentation is reached, the encoded presentation 1128 is 
available. 
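
The request and response exchange between the encoder and the presentation processor may be sketched as a simple pull loop. This is an illustration only and does not use any actual Real Media interface. It assumes that events are (time, data) pairs with distinct times, that "after" means strictly after the requested time, and that the first request precedes every event; the function names are hypothetical.

```python
from bisect import bisect_right


def next_sample(sorted_events, requested_time):
    """Return (event, next_request_time) for the first event strictly after
    `requested_time`, or (None, None) once the end of the presentation is reached.
    """
    times = [t for t, _ in sorted_events]
    i = bisect_right(times, requested_time)
    if i == len(sorted_events):
        return None, None
    event = sorted_events[i]
    # The next request time is the time of the sample just returned.
    return event, event[0]


def encode_events(events):
    """Stand-in for the encoder: pull samples until none remain."""
    sorted_events = sorted(events)      # sorted once, before encoding starts
    encoded = []
    t = float("-inf")                   # initial request precedes every event
    while True:
        event, t_next = next_sample(sorted_events, t)
        if event is None:
            break
        encoded.append(event)
        t = t_next
    return encoded


print(encode_events([(30, "c.html"), (0, "a.html"), (10, "b.html")]))
```

Events sharing a timestamp would need a tie-breaking rule that this sketch does not provide.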

The process of transferring data to the servers will now be described in connection 
with Fig. 11C. After setup and encoding have been completed, the transfer of the 
presentation starts with preparing 1130 lists of files or resources of the presentation. A first 
list includes the table of contents file, the video frame file and the index or template file and 
all of the files that these three files directly reference. A second list is all files destined for 
the streaming media server. A third list is all of the files and resources in events and all of 
the files and resources these events reference directly. Resources that are not directly 
available at the local machine may be omitted from the list. This third list uses the complete 
path name or URL for the file or resource. For the drives or servers used for the files in the 
third list, a base path is found 1132. New directories on the destination servers are then 
created 1134 using the base paths as subdirectories of the target directory on the server. Files 
in all three lists are then transferred 1136 to their respective destinations. 
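
One possible reading of steps 1132 and 1134 is sketched below: the drive letter or host name of each resource in the third list is taken as its base, and that base becomes a subdirectory of the target directory. This interpretation, the function names, and the handling of only drive-letter paths and http, https or ftp URLs are assumptions; the description above does not specify how the base path is derived.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit


def base_and_relative(resource):
    """Split a resource into (base name, relative path using "/" separators)."""
    parts = urlsplit(resource)
    if parts.scheme in ("http", "https", "ftp"):
        return parts.netloc, parts.path.lstrip("/")
    path = PureWindowsPath(resource)
    # Use the drive letter as the base; paths without a drive fall under "local".
    base = path.drive.rstrip(":") or "local"
    tail = path.parts[1:] if path.anchor else path.parts
    return base, "/".join(tail)


def destination_path(target_dir, resource):
    """Place a resource under target_dir/<base>/<relative path>."""
    base, relative = base_and_relative(resource)
    return str(PurePosixPath(target_dir) / base / relative)


print(destination_path("/www/show", "C:\\media\\a.html"))
print(destination_path("/www/show", "http://example.com/img/logo.gif"))
```

Keeping one subdirectory per source drive or server avoids name collisions between files that share a relative path but come from different sources.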

A computer system with which the various elements of the system described above, 
either individually or in combination, may be implemented typically includes at least one 
main unit connected to both one or more output devices, which store information, transmit 
information or display information to one or more users or machines, and one or more input 
devices, which receive input from one or more users or machines. The main unit may 
include one or more processors connected to a memory system via one or more 
interconnection mechanisms. Any input device and output device also are connected to the 
processor and memory system via the interconnection mechanism. 

The computer system may be a general purpose computer system which is 
programmable using a computer programming language. Computer programming languages 
suitable for implementing such a system include procedural programming languages, object- 
oriented programming languages, combinations of the two, or other languages. The 
computer system may also be specially programmed, special purpose hardware, or an 
application specific integrated circuit (ASIC). 

In a general purpose computer system, the processor is typically a commercially 
available processor which executes a program called an operating system which controls the 
execution of other computer programs and provides scheduling, debugging, input/output 
control, accounting, compilation, storage assignment, data management and memory 
management, and communication control and related services. The processor and operating 
system define a computer platform for which application programs in other computer 
programming languages are written. The invention is not limited to any particular processor, 
operating system or programming language. 

A memory system typically includes a computer readable and writeable nonvolatile 
recording medium in which signals are stored that define a program to be executed by the 
processor or information stored on the disk to be processed by the program. Typically, in 
operation, the processor causes data to be read from the nonvolatile recording medium into 
another memory that allows for faster access to the information by the processor than does 
the disk. This memory is typically a volatile, random access memory such as a dynamic 
random access memory (DRAM) or static memory (SRAM). The processor generally 
manipulates the data within the integrated circuit memory and may copy the data to the disk 
if processing is completed. A variety of mechanisms are known for managing data 
movement between the disk and the integrated circuit memory element, and the invention is 
not limited thereto. The invention is not limited to a particular memory system. 

Such a system may be implemented in software or hardware or firmware, or any 
combination thereof. The various elements of this system, either individually or in 
combination, may be implemented as a computer program product including a computer- 
readable medium on which instructions are stored for access and execution by a processor. 
Various steps of the process may be performed by a computer processor executing 
instructions stored on a computer-readable medium to perform functions by operating on 
input and generating output. 

Additionally, the computer system may be a multiprocessor computer system or may 
include multiple computers connected over a computer network. Various possible 
configurations of computers in a network permit access to the system by multiple users using 
multiple instances of the programs even if they are dispersed geographically. Each program 
or step shown in the figures and the substeps or subparts shown in the figures may correspond 
to separate modules of a computer program, or may be separate computer programs. Such 
modules may be operable on one or more separate computers or other devices. The data 
produced by these components may be stored in a memory system or transmitted between 
computer systems or devices. The plurality of computers or devices may be interconnected 
by a communication network, such as a public switched telephone network or other circuit 
switched network, or a packet switched network such as an Internet protocol (IP) network. 
The network may be wired or wireless, and may be public or private. 

A suitable platform for implementing software to provide such an authoring system 
includes a processor, operating system, a video capture device, a Creative Labs Sound Blaster 
or compatible sound card, CD-ROM drive, and 64 Megabytes of RAM minimum. For analog 
video capture, the video capture device may be the Osprey-100 PCI Video Capture Card or 
the Eskape My Capture II USB Video Capture Device. The processor may be a 230 
megahertz Pentium II or Pentium III processor, or Intel equivalent processor with MMX 
Technology, such as the AMD-K6-III, or Celeron Processor with 128K cache, and may be 
used with an operating system such as the Windows98/98SE or Millennium operating 
systems. For digital video capture, the video capture device may be an IEEE 1394 Port 
(OHCI compliant or Sony ILink). The processor may be a 450 megahertz Pentium II or 
Pentium III processor, or Intel equivalent processor with MMX Technology, such as the 
AMD-K6-III, or Celeron processor with 128K cache. 

Given an authoring tool such as described above, the use of multiple authoring tools 
by multiple authors for publishing data to a public or private computer network for access by 
other users will now be described in connection with Figs. 12 and 13. In particular, an 
encoded presentation 1200 and associated data files 1202 may be transferred by a transfer 
tool 1204 to a streaming media server 1206 and a data server 1208. The transfer tool also 
may store preference data 1210 for the author with a profile manager 1212. The streaming 
media server 1206 and data server 1208 may be publicly accessible web servers accessible by 
web browsers 1214. Other kinds of distributed libraries of digital media, instead of a web 
server, also may be used to publish the presentation. If additional transfer tools 1216 are used 
by other authors, these transfer tools 1216 may transfer the streaming media to the same or a 
different streaming media data server 1206 as the other transfer tool 1204, but may have a 
separate data server 1218. Use of the same streaming media data server is possible where 
each transfer tool has access to the streaming media server 1206. Such access may be built 
into either the transfer tool or the authoring tool. The transfer tool and/or the authoring tool 
may be provided by the same entity or another entity related to the entity that owns or 
distributes the streaming media server 1206. The streaming media server may be 
implemented, for example, as described in U.S. Patent Application Serial No. 09/054,761, 
which corresponds to PCT Publication No. WO99/34291. The streaming media server 1206 
may charge authors for access to and/or for the amount of data stored on the streaming media 
server 1206. 

In addition to publishing presentations to the media server, an authoring tool may use 
the media server or data server as a source of content for presentations. As shown in Fig. 13, 
for example, the editing system 1300, and optionally the transfer system 1302, may have 
access to one or more streaming servers 1304. The editing system may acquire stock footage 
1306 from the streaming media server 1304 or other content from a data server 1312. Such 
stock footage, for example, may be purchased from the entity maintaining or owning the 
streaming server 1304. An author may add such stock footage to the presentation. The 
completed presentation 1308 may be in turn published by the transfer system 1302 to the 
streaming media server 1304, with data files 1310 stored on a data server 1312. Tools used 
by other publishers and authors, as indicated at 1314, also may access the streaming server 
1304 for receiving stock footage or for publishing presentations. Such authors and publishers 
may use a separate data server 1316 for storing nontemporal data related to the temporal data 
published on the streaming server 1304. 

Having now described a few embodiments, it should be apparent to those skilled in 
the art that the foregoing is merely illustrative and not limiting, having been presented by way 
of example only. Numerous modifications and other embodiments are within the scope of 
the invention. 

What is claimed is: 



